Effect of Recognition Errors on Text Clustering

نویسندگان

  • David Grangier
  • Alessandro Vinciarelli
چکیده

This paper presents clustering experiments performed over noisy texts (i.e. texts that have been extracted through an automatic process like character or speech recognition). The effect of recognition errors is investigated by comparing clustering results performed over both clean (manually typed data) and noisy (automatic speech transcriptions) versions of the same speech recording corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Noisy Text Clustering

This work presents document clustering experiments performed over noisy texts (i.e. text that have been extracted through an automatic process like speech or character recognition). The effect of recognition errors on different clustering techniques is measured through the comparison of the results obtained with clean (manually typed texts) and noisy (automatic speech transcripts affected by 30...

متن کامل

Noise Clustering-Based Speaker Verification

The normalisation method for speaker verification proposed in this paper is based on the idea of the noise clustering method in fuzzy clustering. The proposed method can reduce false acceptance errors and apply to all current normalisation scores. Experiments performed on the ANDOSL and YOHO speech corpora show better results for the proposed method.

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

The Effect of Pictorial Flashcards on the Sight Word Recognition in Kindergartens

It was a quasi-experimental study because the study involved in training participants in twoclasses each containing about 5 to 6 years old pre-primary students. To this end, fifty studentsparticipated in the study who were studying at Misagh School in Tabriz. In order to makesure of their homogeneity, the researcher administered a pre-test. Based on the results, 40students were selected as the ...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004